Elemental confounds
Four elemental confounds
Start at treatment (X)
Look for any arrows coming INTO X
Follow all possible paths to outcome (Y)
A valid adjustment set blocks all backdoor paths
But be careful not to control for colliders!
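The warning about colliders can be checked with a small simulation. The sketch below is an illustration in base R, not code from the slides: the variable C, the helper name sim_collider, and the use of lm() in place of quap() are all assumptions made to keep the example self-contained. X and Y are generated independently, and C is caused by both.

```r
# Hypothetical illustration: X and Y are independent; C is a collider (X -> C <- Y).
sim_collider <- function(N = 1000) {
  X <- rnorm(N)
  Y <- rnorm(N)                 # Y does not depend on X
  C <- rnorm(N, mean = X + Y)   # collider: caused by both X and Y
  c(
    unadjusted = unname(coef(lm(Y ~ X))["X"]),     # no adjustment
    adjusted_C = unname(coef(lm(Y ~ X + C))["X"])  # adjusts for the collider
  )
}

set.seed(1)
print(round(sim_collider(), 2))
```

With no adjustment the X coefficient should sit near zero. Adding C to the model opens the path through the collider and pushes the coefficient away from zero, even though X has no effect on Y.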
Copy this code from the slides or the class book (Simulation 1: Simple Confounding).
Run the base simulation and observe results
Modify the simulation parameters:
library(rethinking) # provides quap() and precis()
# Number of simulated observations
N <- 1000
# Generate data
U <- rnorm(N) # Unobserved confounder
X <- rnorm(N, mean = 0.5 * U) # Treatment affected by U
Y <- rnorm(N, mean = 0.8 * U) # Outcome affected by U only (no direct effect of X)
Z <- rnorm(N, mean = 0.6 * U) # Observed variable: a noisy proxy for U
d <- data.frame(X, Y, Z)
# Fit models
flist1 <- alist(
Y ~ dnorm(mu, sigma),
mu <- a + bX*X,
a ~ dnorm(0, .5),
bX ~ dnorm(0, .25),
sigma ~ dexp(1)
)
m32.1 <- quap(flist1, d)
precis(m32.1)
            mean         sd         5.5%     94.5%
a     0.05213487 0.03786007 -0.008372835 0.1126426
bX    0.36109766 0.03320060  0.308036688 0.4141586
sigma 1.20068648 0.02682494  1.157815048 1.2435579
Output of precis() for the second model, which adds Z as a predictor (coefficient bZ):
            mean         sd        5.5%     94.5%
a     0.04297999 0.03709494 -0.01630488 0.1022649
bX    0.30773684 0.03352066  0.25416435 0.3613093
bZ    0.21409294 0.03254525  0.16207934 0.2661065
sigma 1.17545191 0.02626064  1.13348234 1.2174215
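Because Y is generated from U alone, the true effect of X on Y in this simulation is zero, so a nonzero bX reflects confounding. As a rough cross-check, the sketch below refits the same data-generating structure with base R's lm() instead of quap(). This is a swapped-in fit without priors, so the numbers will differ somewhat from the output above. It also adds a third model that adjusts for U directly, which is possible only because the data are simulated. The helper name sim_confound is hypothetical.

```r
# Hypothetical cross-check of Simulation 1 using lm() in place of quap().
sim_confound <- function(N = 1000) {
  U <- rnorm(N)                    # unobserved confounder
  X <- rnorm(N, mean = 0.5 * U)    # treatment affected by U
  Y <- rnorm(N, mean = 0.8 * U)    # no X term: true effect of X is 0
  Z <- rnorm(N, mean = 0.6 * U)    # noisy proxy for U
  bX <- function(fit) unname(coef(fit)["X"])
  c(
    no_control = bX(lm(Y ~ X)),      # confounded estimate
    control_Z  = bX(lm(Y ~ X + Z)),  # adjusts for the proxy only
    control_U  = bX(lm(Y ~ X + U))   # adjusts for the confounder itself
  )
}

set.seed(1)
print(round(sim_confound(), 2))
```

Comparing the three coefficients shows how much of the backdoor path through U a noisy proxy like Z actually blocks.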
“Bad controls” can create bias in three main ways:
Warning signs of bad controls:
Modify your code for this new simulation (precision parasite):
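A minimal sketch of one way to set this up, assuming the usual precision-parasite structure Z -> X -> Y, where Z affects the treatment but has no path to Y except through X. The coefficient values (bZX = 1, bXY = 0.5) and the helper name sim_parasite are assumptions, and lm() is used here only as a quick stand-in for quap().

```r
# Hypothetical precision-parasite simulation: Z -> X -> Y, no confounding.
sim_parasite <- function(N = 1000, bZX = 1, bXY = 0.5) {
  Z <- rnorm(N)                    # affects the treatment only
  X <- rnorm(N, mean = bZX * Z)    # treatment affected by Z
  Y <- rnorm(N, mean = bXY * X)    # true effect of X on Y is bXY
  data.frame(X, Y, Z)
}

set.seed(1)
d <- sim_parasite()
# Estimate and standard error of the X coefficient, without and with Z.
print(coef(summary(lm(Y ~ X, data = d)))["X", c("Estimate", "Std. Error")])
print(coef(summary(lm(Y ~ X + Z, data = d)))["X", c("Estimate", "Std. Error")])
```

The data frame uses the same column names (X, Y, Z) as the first simulation, so the earlier quap() model lists should accept it in place of the original d.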
Test your previous models with these new data, using different sample sizes (n = 50, 100, 1000). For each sample size, compare:
How does sample size affect the impact of the precision parasite? Under what conditions is the precision loss most severe?
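One way to explore these questions is to repeat the simulation many times at each sample size and look at how much the estimate of the X coefficient varies with and without Z. The sketch below assumes the structure Z -> X -> Y with hypothetical coefficients and uses lm() rather than quap() for speed. The helper name parasite_spread and the number of repetitions are made up for illustration.

```r
# Hypothetical helper: spread of the X estimate across repeated simulations.
parasite_spread <- function(N, reps = 200, bZX = 1, bXY = 0.5) {
  est <- replicate(reps, {
    Z <- rnorm(N)
    X <- rnorm(N, mean = bZX * Z)
    Y <- rnorm(N, mean = bXY * X)
    c(without_Z = unname(coef(lm(Y ~ X))["X"]),
      with_Z    = unname(coef(lm(Y ~ X + Z))["X"]))
  })
  apply(est, 1, sd)   # standard deviation of each estimator across repetitions
}

set.seed(1)
for (n in c(50, 100, 1000)) {
  cat("N =", n, "\n")
  print(round(parasite_spread(n), 3))
}
```

Changing bZX in the call shows how the strength of the Z -> X arrow interacts with sample size.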
Modify your code for this new simulation (bias amplification):
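A minimal sketch, assuming the usual bias-amplification structure: Z causes X, and an unobserved U causes both X and Y. The true effect of X on Y is set to zero so that any nonzero estimate is bias. The helper name sim_amplify, the use of a single strength bU for both arrows out of U, and the use of lm() in place of quap() are assumptions.

```r
# Hypothetical bias-amplification simulation: Z -> X <- U -> Y, true X effect = 0.
sim_amplify <- function(N = 1000, bU = 1, bZX = 1) {
  Z <- rnorm(N)                             # cause of X only
  U <- rnorm(N)                             # unobserved confounder
  X <- rnorm(N, mean = bZX * Z + bU * U)    # treatment affected by Z and U
  Y <- rnorm(N, mean = bU * U)              # no X term: any nonzero bX is bias
  c(
    without_Z = unname(coef(lm(Y ~ X))["X"]),
    with_Z    = unname(coef(lm(Y ~ X + Z))["X"])
  )
}

set.seed(1)
# One row per estimator, one column per confounder strength (0.5, 1, 2).
print(round(sapply(c(0.5, 1, 2), function(b) sim_amplify(bU = b)), 2))
```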
Compare different confounder strengths (0.5, 1, 2).
Questions:
* What happens to the bias when you control for Z?
* How does the strength of the confounding affect the amount of bias amplification?
* Can you explain why this happens using the DAG?
Modify your code to create a scenario with both a precision parasite variable and a bias amplification variable.
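The exact structure for this scenario is not specified here, so the following is only one possible reading: two variables, Z1 and Z2, each cause X, while an unobserved U confounds X and Y. The variable names, the coefficients, the helper name sim_both, and the use of lm() in place of quap() are all assumptions. The true effect of X on Y is set to zero.

```r
# One possible reading (an assumption): Z1 and Z2 both cause X; U confounds X and Y.
sim_both <- function(N = 1000, bU = 1, bZ1 = 1, bZ2 = 1) {
  Z1 <- rnorm(N)
  Z2 <- rnorm(N)
  U  <- rnorm(N)                                       # unobserved confounder
  X  <- rnorm(N, mean = bZ1 * Z1 + bZ2 * Z2 + bU * U)
  Y  <- rnorm(N, mean = bU * U)                        # true effect of X is 0
  bX <- function(f) unname(coef(lm(f))["X"])           # X coefficient for a formula
  c(
    neither = bX(Y ~ X),
    Z1_only = bX(Y ~ X + Z1),
    Z2_only = bX(Y ~ X + Z2),
    both    = bX(Y ~ X + Z1 + Z2)
  )
}

set.seed(1)
print(round(sim_both(), 2))
```

Setting bU = 0 removes the confounding, which makes it possible to compare the same four adjustment sets when the extra variables can only cost precision.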
Questions:
* What happens to our estimates when we control for both variables?
* Is it better to:
  * Control for neither
  * Control for just one (which one?)
  * Control for both
* How can we use DAGs to decide which controls to include?